In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from numba import jit # Use it for speed
from scipy import stats
In [2]:
def f(x):
return x**2 + 3*x + np.log(x)
step= 0.001
x = np.arange(1,5+step*0.1,step)
y = f(x)
print x.min(), x.max()
print y.min(), y.max()
plt.plot(x, y, lw=2., color="r")
plt.fill_between(x, 0, y, color="r", alpha=0.5)
plt.axhline(y=0, lw=1., color="k", linestyle="--")
plt.axhline(y=y.max(), lw=1., color="k", linestyle="--")
plt.axvline(x=x.min(), lw=1., color="k", linestyle="--")
plt.axvline(x=x.max(), lw=1., color="k", linestyle="--")
plt.xlabel("x")
plt.ylabel("y")
plt.title("$f(x) = x^2 + 3x + \ln{x}, x\in[1,5]$")
Out[2]:
Concretely, we are interested in knowing the area of the red-shaded region in the above figure. Furthermore, I have also provided a rectangular bounding box for the range of values of $x$ and $y$. The true value of the area under the curve is $\sim{81.381}$ using its analytic integral formula (see http://www.wolframalpha.com/input/?i=integrate+x%5E2+%2B+3x+%2B+ln(x),+x+in+%5B1,5%5D).
The most accurate way to get the value of the area is to find the value of the definite integral $\int_{1}^{5} f(x) dx$. However, in many cases analytically finding this integral is very tough, especially if the function is not easily integrable. This is where numerical methods for approximating the integral come handy. Monte Carlo (MC) techniques are one of the most popular form of numerical solution used for definite integral calculation.
A basic intuition of the Monte Carlo Integration is as follows:
Below, we will try to approximate the area of the curve using the MC integration method described above. We will use $N = 10^5$, and plot the points which fall in the region of the area in red and the other points in grey.
In [3]:
@jit
def get_MC_area(x, y, f, N=10**5, plot=False):
x_rands = x.min() + np.random.rand(N) * (x.max() - x.min())
y_rands = np.random.rand(N) * y.max()
y_true = f(x_rands)
integral_idx = (y_rands <= y_true)
if plot:
plt.plot(x_rands[integral_idx], y_rands[integral_idx],
alpha=0.3, color="r", linestyle='none',
marker='.', markersize=0.5)
plt.plot(x_rands[~integral_idx], y_rands[~integral_idx],
alpha=0.3, color="0.5", linestyle='none',
marker='.', markersize=0.5)
plt.axhline(y=0, lw=1., color="k", linestyle="--")
plt.axhline(y=y.max(), lw=1., color="k", linestyle="--")
plt.axvline(x=x.min(), lw=1., color="k", linestyle="--")
plt.axvline(x=x.max(), lw=1., color="k", linestyle="--")
plt.xlabel("x")
plt.ylabel("y")
plt.title("$f(x) = x^2 + 3x + \ln{x}, x\in[1,5]; N=%s$" % N)
print "Proportion points in space: %.3f" % (integral_idx).mean()
area = (integral_idx).mean() * (
(x_rands.max() - x_rands.min()) * (y_rands.max() - y_rands.min())
)
return area
In [4]:
area = get_MC_area(x, y, f, N=10**5, plot=True)
print "Area is: %.3f" % area
As we can observe, the number of points which fall inside the region of interest, are proportional to the area of the region. The area however, marginally close to the true area of $81.38$. Let us also try with a higher value of $N=10^7$
In [5]:
area = get_MC_area(x, y, f, N=10**7, plot=True)
print "Area is: %.3f" % area
The above figure, shows that for $N=10^7$, the region covered by the sampled points is almost as smooth as the shaded region. Furthermore, the area is closer to the true value of $81.38$.
Now, let us also analyze, how the value of the calculated area changes with the order of number of sampled points.
In [6]:
for i in xrange(2,8):
area = get_MC_area(x, y, f, N=10**i, plot=False)
print i, area
Clearly, as the number of points increase, the area becomres closer to the true value.
Let us further examine this change by starting with $10^3$ points and then going all the way till $10^6$ points.
In [7]:
%%time
N_vals = 1000 + np.arange(1000)*1000
areas = np.zeros_like(N_vals, dtype="float")
for i, N in enumerate(N_vals):
area = get_MC_area(x, y, f, N=N, plot=False)
areas[i] = area
In [8]:
print "Mean area of last 100 points: %.3f" % np.mean(areas[-100:])
print "Areas of last 10 points: ", areas[-10:]
In [9]:
plt.plot(N_vals, areas, color="0.1", alpha=0.7)
plt.axhline(y=np.mean(areas[100:]), linestyle="--", lw=1., color="k")
plt.ylabel("Area")
plt.xlabel("Number of samples")
#plt.xscale("log")
Out[9]:
As we can observe from the figure above, for lower number of sampled points, the estimates of MC integration are quite noisy. However, for larger number of points this value converges to the true estimates.
In [ ]: